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Abstract. We use p- values to identify the threshold level at which a regression function 
takes off from its baseline value, a problem motivated by applications in toxicological and 
^ , pharmacological dose-response studies and environmental statistics. We study the problem 



in two sampling settings: one where multiple responses can be obtained at a number of dif- 
ferent covariate-levels and the other the standard regression setting involving limited number 
of response values at each covariate. Our procedure involves testing the hypothesis that the 
regression function is at its baseline at each covariate value and then computing the poten- 
tially approximate p-value of the test. An estimate of the threshold is obtained by fitting a 
piecewise constant function with a single jump discontinuity, otherwise known as a stump, 
to these observed p-values, as they behave in markedly different ways on the two sides of the 
threshold. The estimate is shown to be consistent and its finite sample properties are studied 
through simulations. Our approach is computationally simple and extends to the estimation 
of the baseline value of the regression function, heteroscedastic errors and to time-series. It 
0^ ■ is illustrated on some real data applications. 
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1. Introduction 



In a number of applications, the data follow a regression model where the regression func- 
tion fi is constant at its baseline value r up to a certain covariate threshold d° and deviates 
significantly from r at higher covariate levels. For example, consider the data shown in the 
left panel of Fig. CD It depicts the physiological response of cells from the IPC-81 leukemia 
rat cell line to a treatment, at different doses; more details are given in Section [321 The 
objective here is to study the toxicity in the cell culture to assess environmental hazards. The 
function stays at its baseline value for high dose levels which corresponds to the dose becom- 
ing lethal, and then takes off for lower doses, showing response to treatment. This problem 
requires procedures that can identify the change-point in the regression function, namely 
where it deviates from the baseline value. The threshold is of interest as it corresponds to 
maximum safe dose level beyond which cell culture s stop responding. Similar problems also 



arise in other toxicological applications (|Coxl . 



1987). 
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Figure 1 . The three data examples. Left panel: Response of cell-cultures 
at different doses. Middle panel: Logratio measurements over range. Right 
panel: Annual global temperature anomalies from 1850 to 2009. 

Problems with similar structure also arise in other pharmacological dose-response stud- 
ies, where fx(x) quantifies the response at dose-level x and is ty pically at the baseline v alue 
up to a certain dose, know n as the minimum effective dose; see IChen & Changl (|2007l) and 
Tamhane & Loganl (|2002l) and the references therein. In such applications, the number of 
doses or covariate levels is relatively smal l, say up to 20, and many procedures propose d 
in the literature are based on testing ideas (jTamhane & Logan , 2002 : Hsu & Berger . 1999 ). 
However, in other application domains, the number of doses can be fairly large compared 
to the number of replicates at each dose. The latter is effectively the setting of a standard 
regression model. In the extreme case, there is a single observation per covariate level. Data 
from such a setting are shown in the middle panel of Fig. [TJ depicting the outcome of a light 
detection and ranging experiment, used to detect the change in the level of atmospheric pollu- 
tants. This techniq ue uses the r e flection of laser-emit t ed lig ht to detect chemical compounds 



in the atmosphere (|Holst et al 



1996 



Ruppert et all 1 1997b . The predictor variable, range, 



is the distance traveled before the light is reflected back to its source, while the response 
variable, logratio, is the logarithm of the ratio of received light at two different frequencies. 
The negative of the slope of the underlying regression function is proportional to mercury 
concentration at any given value of range. The point at which the function falls from its 
baseline level corresponds to an emission plume containing mercury and, thus, is of interest. 
An important difference between these two examples is that the former provides the luxury 
of multiple observations at each covariate level, while the latter does not. 

Another relevant application in a time-series context is given in the right panel of Fig. \T\ 
where annual global temperature anomalies are reported from 1850 to 2009. The study of 
such anomalies, temperature deviations from a base value, has received much attenti on in the 
context of global warming from both the scientific as well as the general community (|Melillo , 
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1999; 



Delworth and KnutsonL 



20001) . The figure suggests an initial flat stretch followed by 



a rise in the function. Detecting the advent of global warming, which is the threshold, is of 
interest here. While we take advantage of the independence of errors in the previous two 
datasets, this application has an additional feature of short range dependence which needs to 
be addressed appropriately. 

Formally, we consider a function p{x) on [0, 1] with the property that fx(x) = To for x < d° 
and fx(x) > t for x > d° for some d° E (0, 1). As already mentioned, quantities of prime 
interest are d° and r that need to be estimated from realizations of the model Y = fx(X) + e. 
We call d° the t threshold of the function fx. Here r is the global minimum for the function 
fx. To fix ideas, we work only with this setting in mind. The methods proposed can be easily 
imitated for the first data application where the baseline stretch is on the right as well as for 
the second data application where r is the maximum. 

In this generality, i.e., without any assumptions on the behavior of the function in a neigh- 
borhood of d°, the estimation of the threshold d° has not been extensively addressed in the 
literature. In the simplest possible setting of the problem posited, fx has a jump discontinuity 
at d°. In this case, d° corresponds to a change-point for fx and the p r oblem reduces to esti - 



mating this c 



hange 



Koul & Oian 



p oint. Such models are well stu d ied; s ee 



(120021) . IPonsI (120031) . 



Lan et al 



(2009), 



Pons 



Mueller] (1 19921) . Loader! (Il996h . 



(|2009l) and the references therein 



Result s on estimating a change-point in a density can be found in llbragimov & Khasminskii 
Jl982l) . 

The problem becomes significantly harder when fx is continuous at d°; in particular, the 
smoother fx is in a neighborhood of d°, the more challenging the estimation. If d° is a cusp 
of fx of some known order p, i.e., the first p — 1 right derivatives of fx at d° equal but the p-th 
does not, so that d° is a change-point in the p-th der i vative , one can obtain no npar ametric 
estim ates for d° using either kernel based (|Muellerl . 1 19920 or wavelet based (IRaimondoL 
1998b methods. If the degree of differentiability of fx at d° is not known, this becomes an 
even harder problem. In fact, it was pointed out to us by one of the referees that if p is 
unknown then there is no method for which the estimate, d, will be uniformly consistent, 
i.e., for any e > 0, sup M P^{\d — d°\ > e} — > 0. Here, the supremum is taken over all choices 
of fx with a t threshold at d°. 

This paper develops a novel approach for the consistent estimation of d° in situations 
where single or multiple observations can be sampled at a given covariate value. The devel- 
oped nonparametric methodology relies on testing for the value of fx at the design values of 
the covariate. The obtained test statistics are then used to construct p-values which, under 
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mild assumptions on p,, behave in markedly different manner on either side of the threshold 
d° and it is this discrepancy that is used to construct an estimate of d°. The approach is 
computationally simple to implement and does not require knowledge of the smoothness of 
p at d°. In a dose-response setting involving several doses and large number of replicates per 
dose, the p-values are constructed using multiple observations at each dose. The approach 
is completely automated and does not require the selection of any tuning parameter. In the 
case of limited or even single observation at each covariate value, referred to as the stan- 
dard regression setting in this paper, the p-values are constructed by borrowing information 
from neighboring covariate values via smoothing which only involves selecting a smoothing 
bandwidth. The first data application falls under the dose-response setting and the other two 
examples fall under the standard regression regime. We establish consistency of the proposed 
procedure in both settings. 

An estimate of p, say p, by itself, fails to offer a satisfactory solution for estimating d°. 
Naive estimates, using p, may be of the form d^ = sup{x : p(x) < t } or d^ = inf {x : 
p(x) > r }. The estimator d^ performs poorly when p is not monotone, and is close to 
t at values to the far right of d°, e.g., when p is tent-shaped. Also, S~ 2 \ by itself, is not 
consistent and one would typically need to substitute t with a To + rj n , with i] n — > at an 
appropriate rate, to attain consistency. In contrast, our approach does not need to introduce 
such exogenous parameters. 

2. Formulation and Methodology 

2.1. Problem Formulation. Consider a regression model Y = p(X) + e, where p is a 
function on [0, 1] and 

(j,(x) = t (x < d°), p(x) > t (x > d°), (1) 

for o? G (0, 1), with an unknown r G E. The covariate X is sampled from a Lebesgue 
density / on [0, 1] and E(e \ X = x) = 0, a 2 (x) = var(e | X = x) > for x G 
[0, 1]. We assume that / is continuous and positive on [0, 1] and \i is continuous. No further 
assumptions are made on the behavior of fi, especially around d°. We have the following 
realizations: 

Y i:j = /i(Xi) + €ij (i = 1, . . . ,n; j = 1, . . . ,m), (2) 

with N = m x n being the total budget of samples. The e^s are independent given X 
and distributed like e and the XiS are independent realizations from /. Also, © with m = 
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1 corresponds to the usual regression setting which simply has only one response at each 
covariate level. 

We construct consistent estimates of d° under dose-response and standard regression set- 
tings. In the dose-response setting, we allow both m and n to be large and construct p- values 
accordingly. We refer to the corresponding approach as Method 1 from now on. In the other 
setting, we consider the case when m is much smaller compared to n and extend our ap- 
proach through smoothing. We refer to this extension as Method 2, which requires choosing 
a smoothing bandwidth. The two methods rely on the same dichotomous behavior exhibited 
by the approximate p- values, although constructed differently. 

2.2. Dose-Response Setting (Method 1). We start by introducing some notation. Let %. = 
Y^Li Yij/ m an d x E (0, 1) denote a generic value of the covariate. Let <j m n = a and 
f m>n = f denote the estimators of a(-) and t respectively. For homoscedastic errors, <3- m ri (-) 
is the standard pooled estimate, i.e., <r^ re (x) = J2i jC^ij ~ Yi) 2 /{nm — m), while for the 
heteroscedastic case <r^ n (X;) = J2J=i(Yij~ ^-)V( m ~ !)• Estimators of t are discussed in 
Section |2~4| We seek to estimate d° by constructing p-values for testing the null hypothesis 
Hq :X : = t against the alternative H l x : > r at each dose Xi = x. The 

approximate p-values are 



Indeed, these approximate p- values would correspond to the exact p-values for the uniformly 
most powerful test if we worked with a known a, a known r and normal errors. 

To the left of d°, the null hypothesis holds and these approximate p-values converge 
weakly to a Uniform(0,l) distribution, for suitable estimators of r . In fact, the distribu- 
tion of p m , n (Xi)s does not even depend on Xi when Xi < d°. Moreover, to the right of d°, 
where the alternative is true, the p-values converge in probability to 0. This dichotomous 
behavior of the p- values on either side of d° can be used to prescribe consistent estimates of 
the latter. We can fit a stump, a piecewise constant function with a single jump discontinuity, 
to the p m „(Xj)s, i — 1, . . . , n, with levels 1/2, which is the mean of a Uniform (0,1) random 
variable, and on either side of the break-point and prescribe the break-point of the best 
fitting stump (in the sense of least squares) as an estimate of d°. Formally, we fit a stump of 
the form ^(x) = (l/2)l(x < d), minimizing 



Pm,n( X i) = Pm,n{ X h T m ,n) = 1 ~ ^{m 



2 (Y,-f)/a(Xi)}. 



(3) 
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over d E [0,1]. Let d m ^ n = argmin de [ 0) i] M m , n (d). The success of our method relies 
on the fact that the p m n (Xj)s eventually show stump like dichotomous behavior. In this 
context, no estimate of fx could exhibit such a behavior directly. Our procedure can be 
thought of as fitting the limiting stump model to the observed Pm,n(Xi)s by minimizing an 
L 2 norm. In fact, the expression in ([3]) can be simplified, and it can be seen that d m>n = 
argmax d6[0) i]M m)ri ((i), where M nhn (d) = n" 1 ^ i:Xi < (i K,r l (^) - 1/4}. The estimate 
can be computed easily via a simple search algorithm as it is one of the order statistics. 

In heteroscedastic models, the estimation of the error variance &(■) can often be tricky. The 
proposed procedure can be modified to avoid the estimation of the error variance altogether 
for the construction of the p-values, as the desired dichotomous behavior of the p-values 
is preserved even when we do not normalize by the estimate of the variance. Thus, we 
can consider the modified p-values pm,n{^i) = 1 — ${m 1 / 2 (F i . — f)} and the dichotomy 
continues to be preserved as E{1 — ®(Z)} = 0.5 for a normally distributed Z with zero 
mean and arbitrary variance. In practice though, we recommend, whenever possible, using 
the normalized p-values as they exhibit good finite sample performance. 

Next, we prove the consistency of our proposed procedure when using the unnormalized 
p-values. The technique illustrated here can be carried forward to prove consistency for 
other variants of the procedure, e.g., when normalizing by the estimate of the error vari- 
ance, but require individual attention depending upon the assumption of heteroscedastic- 
ity/homoscedasticity. 

Theorem 1. Consider the dose-response setting of the problem and let d m>n denote the esti- 
mator based on the non-normalized version ofp-values, e.g., p m , n (Xj) = 1 — ${m 1 / 2 (F i . — 
f)}. Assume that m 1 / 2 (f mjn — r ) = o p (l) as m,n oo, i.e., given e,r) > 0, there 
exists a positive integer L, such that for m,n > L, P(m 1 / 2 |f — r | > e) < r\. Then, 
dm,n — d° = o p (l) as m,n — >■ oo. 

2.3. Standard Regression Setting (Method 2). We now consider the case when m is much 
smaller than n. Let fx(x) = f(x)/f(x) denote the Nadaraya- Watson estimator, where 

r(x) = (nhny^ti^-K {h~\x - X, t )} md f(x) = K)-'^ =1 K {K~\x - X t )}, 
with K being a symmetric probability density or simply a kernel and h n the smoothing 
bandwidth. We take h n = cn~^ for f3 G (0, 1). Let cr n (-) and f n denote estimators of cr(-) 
and r respectively. An estimate of <r 2 (-) can be constructed through standard techniques, 
e.g., smoothing or averaging the squared residuals m{Pj. — fx(Xi)} 2 , depending upon the 
assumption of heteroscedastic or homoscedastic errors. 



Threshold estimation 7 

For x < dP, the statistic T(x, r ) = (nh n ) l l 2 (fi(x) — r ) converges to a normal distribution 
with zero mean and variance V 2 (x) = a 2 (x)K 2 /{mf(x)} with K 2 = J K 2 {u)du. The 
approximate p-value for testing H 0jX against Hx )X can then be constructed as: 

p n (x) = p n (x,T n ) = 1 - $ |T(x, T n )/V n (x) J , 

where V^(x) = a 2 (x)K 2 /{m/(x)}. It can be seen that these p-values also exhibit the 
desired dichotomous behavior. Finally, an estimate of dP is obtained by maximizing 

M n (d) = (1/n) - 1/4} (4) 

i:Xi<d 

over d G [0, 1]. Let d n = argmax rfe [ ,i]M n (cf). Under suitable conditions on f n , this 
estimator can be shown to be consistent when n grows large. 

We have avoided sophisticated means of estimating /x(-), as our focus is on estimation 
of d°, and not particularly on efficient estimation of the regression function. Also, the 
Nadaraya-Watson estimate does not add substantially to the computational complexity of 
the problem and provides a reasonably rich class of estimators through choices of band- 
widths and kernels. 

In many applications, particularly when m — 1 and under heteroscedastic errors, estimat- 
ing the variance function a 2 (-) accurately could be cumbersome. As with Method 1, Method 
2 can also be modified to avoid estimating the error variance, e.g., the estimator constructed 
using ©, based on p n (Xj)s, with p n (x) = 1 — $ {(r?,/i n ) 1 / 2 (/i(a;) — f n )}. Next, we prove 
consistency for the proposed procedure when we do not normalize by the estimate of the vari- 
ance. The technique illustrated here can be carried forward to prove consistency for other 
variants of the procedure. We make the following additional assumptions. 

(a) For some rj > 0, the functions a 2 {-) and a^\x) = E(\e\ 2+r i | X = x), x e [0, 1], 
are continuous. 

(b) The kernel K is either compactly supported or has exponentially decaying tails, 
i.e., for some C, D and a > 0, and for all sufficiently large x, P{\W\ > x} < 
Cexp(— Dx a ), where W has density K. Also, K 2 = j K 2 {u)du < oo. 

Assumption (a) is very common in non-parametric regression settings for justifying asymp- 
totic normality of kernel based estimators. Also, the popularly used kernels, namely uniform, 
Gaussian and Epanechnikov, do satisfy assumption (b). 
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Theorem 2. Consider the standard regression setting of the problem with m staying fixed 
and n — > oo. Assume that (nh n ) l l 2 {r n — r ) = o p (l) as n — > oo. Le£ d n denote the estimator 
computed using p n (Xj) = 1 — ${T(Xj, f n )}. TTzen, d n — d° = o p (i) as n — >■ oo. 

Remark 1. The model in © incorporates the situations with discrete responses. For example, 
we can consider binary responses with Y^s indicating a reaction to a dose at level Xi . We 
assume that the function p{x), the probability that a subject yields a reaction at dose x, is 
of the form © and takes values in (0, 1) so that a 2 (x) = p(x){l — p(x)} > 0. The results 
from this section as well as those from Section [2T2l will continue to hold for this setting. 

Remark 2. Our assumption of continuity of p can be dropped and the results from this section 
as well as those from Section 12.21 will continue to hold provided that p is bounded and 
continuous almost everywhere with respect to Lebesgue measure. This includes the classical 
change-point problem where p has a jump discontinuity at d° but is otherwise continuous. 

2.4. Estimators of r . Suitable estimates of r are required that satisfy the conditions stated 
in Theorems [Hand [21 In a situation where d° may be safely assumed to be greater than some 
known positive 77, an estimate of t can be obtained by taking the average of the response 
values on the interval [0, 77] . The estimator would be (nm) ^-consistent and would therefore 
satisfy the required conditions. Such an estimator is seen to be reasonable for most of the 
data applications that are considered in this paper. In situations when such a solution is not 
satisfactory, we propose an approach to estimate r that does not require any background 
knowledge, once again using p-values. 

We now construct an explicit estimator f of r in the dose-response setting, as re- 
quired in Theorem [H using p-values. For convenience, let Z im (r) = p m>n (Xj,r) = 
1 — $ {m 1 / 2 (F i . — T)/<7 m>n (Xj)}. Let r > r . Asm increases, for p(Xi) < r, Z im (r) 
converges to 1 in probability, while for p(Xi) > r, Zi m (r) converges to in probability. 
For any r < r , it is easy to see that Z im (r) always converges to 0, whereas when r = t , 
Zimij) converges to for Xi > d° and E{Z im (T)} converges to 1/2 for Xi < d°. Thus, it is 
only when r = r that Z im {r)% are closest to 1/2 for a substantial number of observations. 
This suggests a natural estimate of To: 

n 

t = T m , n = argmm^{2 im (r) - 1/2} 2 . (5) 

i=l 

Theorem [3] shows that under some mild conditions and homoscedasticity, m 1 / 2 (f m n — r ) is 
o p (l), a condition required for Theorem [T] This proof is given in Supplementary Material 1. 
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Theorem 3. Consider the same setup as in Theorem [7] Assume that the errors are ho- 
moscedastic with variance <Jq. Further suppose that the regression function \i satisfies: 

(A) Given r) > 0, there exists e > such that, for every r > r , 

I{ X > d 0: Mx )-r\<e}f( X ) dx < V- 

Also assume that (f> m , the density function ofm 1 / 2 ei ./<7o> converges pointwise to (j), the stan- 
dard normal density. Then m 1 / 2 (f m n — r ) = o p (l). 

Remark 3. Condition (A) is guaranteed if, for example, /i is strictly increasing to the right of 
d° although it holds under weaker assumptions on p. In particular, it rules out flat stretches to 
the right of d°. The assumption that <p m converges to is not artificial, since convergence of 
the corresponding distribution functions to the distribution function of the standard normal 
is guaranteed by the central limit theorem. 

This approach in © can also be emulated to construct estimators of r for the standard re- 
gression setting by just going through the procedure with p n {X i) r)s instead of p mjn (Xi, r)s 
and it is clear that this estimator is consistent. However, the theoretical properties of this 
estimator, such as the rate of convergence, are not completely known. Nevertheless, the 
procedure has good finite sample performance as indicated by the simulation studies in Sec- 
tion [3] The estimator is positively biased. This is due to the fact that a value larger than r is 
likely to minimize the objective function in © as it can possibly fit the p- values arising from 
a stretch extending beyond [0, d°], in presence of noisy observations. The values smaller than 
r do not get such preference as the true function never falls below t . 

2.5. To smooth or not to smooth. The consistency of the two methods established in the 
previous sections justifies good large sample performance of the procedures, but does not 
provide us with practical guidelines on which method to use given a real application. In dose- 
response studies, it is quite difficult to find situations where both m and n are large. Typically, 
such studies do not administer too many dose levels which precludes n from being large. So, 
we compare the finite sample performance of the two methods for different allocations of m 
and n to highlight their relative merits. 

We study the performance of the two methods for three different choices of regression 
functions. All these functions are assumed to be at the baseline value to the left of d° = 0.5. 
Specifically, M 1 is a piece-wise linear function rising from to 0.5 between d° and 1; M 2 , a 
convex curve, grows like a quadratic beyond d°, and reaches 0.5 at 1; M 3 rises linearly with 
unit slope for values ranging from d° to 0.8 and then decreases with unit slope for values 
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between 0.8 and 1.0. So, Mi and M 2 are strictly monotone to the right of d° and exhibit 
increasing level of smoothness at d°. On the other hand, M 3 is tent-shaped and estimating d° 
is expected to be harder for M 3 compared to M\. 

For each allocation pair (m, n) and a choice of a regression function, we generate re- 
sponses {Yii, . . . , Y im }, with Yij = + e^, the e^s being independent N(0, a 2 ) with 
a = 0.3. The XiS are sampled from Uniform(0,l). The performance for estimating d° = 0.5 
is studied based on root mean square error computed over 2000 replicates, assuming a known 
variance and a known t = 0. For illustrative purposes, we use the Gaussian kernel for 
Method 2. Based on heuristic computations, a bandwidth of the form h n = cn~ 1 ^ 2p+l " 1 is 
chosen as it is expected to attain the minimax rate of convergence for estimating a cusp of 
order p, as per lRaimondol (I1998|) . For Mi and M 3 , p = 1 while for M 2 , p is 2. We report the 
simulations for the best c which minimizes the average of the root mean square errors for the 
sample sizes considered, over a fine grid. 

There are results in the literature which suggest a possibly differ ent minim a x rate 
of convergence based on calculations in a slightly different model (|Neumannl . 



1997 



Goldenshluger et all 120061) and hence a possibly different choice of optimal bandwidth. But 



not much improvement was seen in terms of the root mean square errors for other choices of 
bandwidth. 

The root mean square errors and the biases for each allocation pair are given in Table 1. 
Both procedures are inherently biased to the right as the p-values are not necessarily close 
to zero to the immediate right of d°. When m and n are comparable, e.g., m < 15 and 
n < 15, Method 2, which relies on smoothing, does not perform well compared to Method 
1. However, when m is much smaller than n, e.g., m = 4 and n = 80, smoothing is efficient 
and Method 2 is preferred over Method 1 . When both m and n are large, both methods work 
well. As Method 1 does not require selecting any tuning parameter, we recommend Method 
1 in such situations. 



2.6. Extension to Dependent Data. The global warming data falls under the standard re- 
gression setup, but involves dependent errors. Moreover, the data arises from a fixed design 
setting, with observations recorded annually. Here, we discuss the extension of Theorem [2] 
in this setting. Under fixed uniform design, we consider the model Y^ n = fj,(i/n) + e^ n (i = 
1, . . . , n). Under such a model, Y i>n and e i; „ must be viewed as triangular arrays. The es- 
timator of the regression function is jl(x) = (n/i n ) _1 Y ijU K {h^^x — i/n)}. For each 
n, we assume that the process is stationary and exhibits short-range dependence. Under 
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Table 1. Root mean square errors (xlO 2 ) and biases (xlO 2 ), the first and 
second entries respectively, for the estimate of threshold d° obtained using 
Methods 1 and 2, for the three models with a = 0.3 and different choices of 
m and n. 





M l 


M 2 


M 3 


(m, n) 


Method 1 


Method 2 

(0.04n" 1 /3) 


Method 1 


Method 2 

(0.08n -1 /5) 


Method 1 


Method 2 

(0.04n" 1 / 3 ) 


(5,5) 


16.9, 4.5 


18.0, 9.6 


20.2, 11.6 


21.8, 10.9 


20.5, 7.5 


23.7, 14.3 


(5, 10) 


15.7, 6.7 


16.6, 9.1 


21.8, 17.2 


21.3, 11.5 


20.1, 10.9 


20.8, 12.4 


(10, 10) 


13.4, 3.3 


14.1, 5.6 


19.0, 13.9 


19.3, 8.6 


14.9, 4.6 


15.6, 6.9 


(10, 15) 


11.8,4.9 


12.6, 5.2 


18.7, 15.5 


19.0, 7.8 


12.2, 5.3 


12.9, 5.8 


(10, 20) 


10.8, 6.2 


10.9, 4.6 


18.5, 16.7 


17.6, 6.9 


10.9, 6.4 


11.0, 4.9 


(15, 10) 


12.5, 1.8 


12.6, 4.0 


17.7, 11.7 


18.4,7.0 


13.5, 2.0 


13.2, 4.6 


(15, 15) 


10.4, 3.8 


10.9, 4.0 


17.2, 14.0 


17.5, 6.6 


10.9, 3.8 


11.2, 3.8 


(15, 20) 


9.4, 4.2 


9.8,3.8 


17.0, 14.9 


17.4, 5.9 


9.2, 4.4 


10.0, 3.6 


(20, 10) 


12.4, 1.0 


12.3, 2.9 


16.5, 11.2 


17.5, 6.5 


12.7, 0.7 


12.3, 3.9 


(20, 15) 


10.2, 2.5 


10.6, 2.5 


16.2, 13.3 


17.0, 5.8 


10.3, 2.6 


10.6, 2.7 


(20, 20) 


8.9, 3.3 


9.7, 2.3 


15.9, 13.9 


16.1,5.4 


8.7, 3.6 


9.3, 2.7 


(3, 80) 


16.2, 14.5 


10.5, 8.0 


26.9, 26.2 


16.4, 9.3 


19.7, 16.6 


11.0, 8.3 


(3, 100) 


16.2, 14.6 


9.9, 7.7 


27.0, 26.5 


15.9, 8.9 


18.7, 15.9 


9.8, 7.4 


(4, 80) 


14.1, 12.4 


9.4, 6.9 


24.8, 24.2 


15.7, 8.6 


15.0, 12.9 


9.8, 6.8 


(4, 100) 


14.0, 12.5 


8.8, 6.3 


24.9, 24.4 


14.8, 7.8 


14.4, 12.5 


8.7, 6.3 



Assumptions 1-5, listed in iRobinsonl (| 1 997f) . it can be shown that {nh n ) l l 2 {jl(xk) — [i(xk)}, 
Xk G (0, l),k = 1,2 and x\ ^ X2, converge jointly in distribution to independent nor- 
mals with zero mean. In this setting, the working p- values, defined here to be pn\x, r ) = 
1 — ${(n/i n ) 1 / 2 (/i(a;) — r )}, still exhibit the desired dichotomous behavior. To keep the 
approach simple, we have not normalized by the estimate of the variance as this would have 
involved estimating the auto-correlation function. The conclusions of Theorem |2] can be 
shown to hold when d n is constructed using © based on pn{X i} f)s. Here, f is constructed 
via averaging the responses over an interval that can be safely assumed to be on the left of 
d°, as discussed in Section [2~4l 
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Table 2. Root mean square errors (xlO 2 ) and biases (xlO 2 ), the first and 
second entries respectively, for the estimate of threshold d° obtained using 
Method 1 and the estimate of t with a = 0.3 for the three models. 



(m, n) 


Mi 


M 2 


M 3 


d° 


TO 


d° 


TO 


d° 


TO 


(5,5) 


25.5,21.5 


17.5,9.9 


28.2, 25.5 


13.4, 6.0 


31.2, 26.2 


14.2, 8.4 


(5, 10) 


24.8, 20.5 


14.3, 8.6 


27.1, 22.3 


10.2, 4.9 


30.3, 24.3 


11.2,7.2 


(10, 10) 


20.7, 15.7 


12.4, 6.7 


24.6, 21.6 


7.7, 3.5 


27.2, 21.5 


10.4, 6.9 


(10, 20) 


17.2, 13.9 


9.0, 5.2 


24.0, 22.4 


5.4, 2.9 


24.8, 19.8 


8.6, 6.2 


(10, 50) 


13.6, 12.1 


5.6, 3.8 


23.5, 22.8 


3.8,2.7 


18.6, 15.7 


7.0, 5.8 


(20, 50 ) 


9.0, 7.6 


3.1, 1.8 


19.4, 18.7 


2.5, 1.7 


12.4, 10.0 


5.0, 3.4 


(50, 100 ) 


5.0, 4.3 


1.1,0.7 


15.2, 14.8 


1.2, 0.9 


5.2, 4.6 


1.4, 0.9 



3. Simulation Results and Data Analysis 

3.1. Simulation Studies. We consider the same three choices of the regression function Mi , 
M 2 and M 3 , as in Section [231 The data are generated for allocation pair (m, n) and a choice 
of regression function, with the errors being independent iV(0, a 2 ), where a = 0.3. The Xts 
are again sampled from Uniform(0,l). We study the performance of the two methods when 
the estimates of d° are constructed using p-values that are normalized by their respective 
estimates of variances. 

Firstly, we consider Method 1 . In Table 2, we report the root mean square error and the 
bias for the estimators of d° and r , for different choices of m and n. For moderate sample 
sizes, M 3 shows greater root mean square errors in general than Mi and M 2 as the signal is 
weak close to 1 for M 3 . For large sample sizes, the performance of the estimate is similar 
for Mi and M 3 and is better than that for M 2 , which can be ascribed to M 2 being smoother 
at d°. The procedure is inherently biased to the right as p-values are not necessarily close to 
zero to the immediate right of d°. Further, the estimator, on average, moves to the left with 
increase in m as the desired dichotomous behavior becomes more prominent. 

Next, we study the performance of Method 2. As the estimation procedure is entirely 
based on { (X h Fj.) }" =1 , without loss of generality, we take m to be 1 . We again work with the 
Gaussian kernel with the smoothing bandwidth chosen in the same fashion as in Section [231 
In Table 3, we report the root mean square error and the bias for the two estimators, for 
different choices of m and n. We see trends similar to those for Method 1, across the choices 
of the regression functions. 
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Table 3. Root mean square errors (xlO 2 ) and biases (xlO 2 ), the first and 
second entries respectively, for the estimate of threshold d° obtained using 
Method 2 and the estimate of t with a = 0.3 for the three models. 





Mi 


M 2 


M 3 


n 


h n = O.ln" 1 / 3 


hn = O.lSn" 1 / 5 


h n = O.ln -1 / 3 




d° 


TO 


d° 


TO 


d° 


TO 


20 


28.5, 17.9 


20.9, 10.5 


29.0, 17.8 


14.7, 5.7 


32.6, 22.4 


17.4, 8.4 


30 


26.8, 15.5 


18.4, 9.4 


26.8, 14.6 


12.2, 3.8 


31.9, 21.8 


15.1,7.4 


50 


23.7, 13.8 


15.8, 8.0 


24.4, 12.4 


9.9, 3.1 


28.4, 18.7 


13.1,6.9 


80 


21.5, 11.2 


13.7, 6.6 


22.2, 8.4 


7.8, 1.9 


27.0, 17.8 


11.7, 6.8 


100 


19.5,9.6 


12.5, 5.3 


21.6, 8.2 


7.5, 1.7 


25.1, 14.7 


10.9, 6.1 


200 


15.9, 6.2 


8.8,3.5 


19.1,6.0 


4.9, 1.1 


21.0, 12.2 


9.2, 5.3 


500 


10.4, 0.6 


4.6, 1.4 


16.4, 3.9 


2.7, 0.5 


14.2, 5.4 


6.0, 2.5 


1000 


9.5, 0.4 


3.1,0.7 


15.0, 2.0 


2.0, 0.4 


10.5,2.1 


3.9, 1.2 


1500 


8.5,0.3 


2.3, 0.5 


14.8, 1.5 


1.8, 0.3 


8.8,0.8 


2.8, 0.8 


2000 


7.2, 0.2 


2.0, 0.5 


13.8,0.7 


1.5, 0.2 


8.1,0.1 


2.3, 0.5 



We studied the performance of the estimates under settings where d° is closer to the bound- 
ary of [0, 1]. Optimal allocation pairs (m, n) were also computed for a given model and a 
fixed budget N = m x n. These details are skipped here but can be found in Section 5.1 of 
the Supplementary Material 1 . We also compared Method 1 to some competing procedures 
developed in the pharmacologic al dose-re s ponse s etting to identify the minimum effe ctive 
dose, namely the approac hes in Iwilliamsl (1197 lb . iHsu & Bergerl (|1999|) . 



Chen (1999) and 



Tamhane & Loganl (|2002l) . Method 1 was seen to perform well in comparison with these 



methods. For more details, see Section 5.2 of the Supplementary Material 1. 

Based on our simulation study, including results not shown here due to space considera- 
tions, the following practical recommendations are in order. In terms of optimal allocation 
under a fixed budget N, it is better for one to invest in an increased number of covariate val- 
ues n, rather than replicates m. In the case where the threshold d° is closer to the boundaries, 
investment in n proves fairly important. Further, when the sample size is reasonably large, 
the procedure that avoids estimating the variance function and works with non-normalized 
p-values, is competitive and is recommended in the regression settings with heteroscedastic 
errors and time- series. 
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3.2. Data Applications. The first data application deals with a dose-response experiment 
that studies the effect on cells from the IPC-81 leukemia rat cell line to treatment with 1- 



methyl- 3 -buty limidazo 



ium t etrafluoroborate, at different doses measured in /jM, micro mols 



per liter (|Ranke et all 120041) . The substance treating the cells is an ionic liquid and the 
objective is to study its toxicity in a mammalian cell culture to assess environmental hazards. 
The question of interest here is at what dose level toxicity becomes lethal and cell cultures 
stop responding. 

It can be seen from the physiological responses shown in the left panel of Fig. [T] that 
there is a decreasing trend followed by a flat stretch. Hence, it is reasonable to postulate a 
response function that stays above a baseline level r until a transition point d° beyond which 
it stabilizes at its baseline level. We assume errors to be heteroscedastic, as the variability 
in the responses changes with level of dose, with more variation for moderate dose levels 
compared to extreme dose levels. This is the small (m, n) case with m and n being compa- 
rable; in fact, m = n = 9. Hence we apply Method 1 to this problem. The estimate of t 
was constructed using the procedure based on p- values as described in Section [2~4l We get 
f = 0.0286 with the corresponding d = 5.522 log fiM, the third observation from right. We 
believe that this is an accurate estimate of d°, since the cell-cultures exhibit high responses 
at earlier dose levels and no significant signal to the right of the computed d. 

The second example, as discussed in the introduction, involves measuring mercury con- 
centration in the atmosphere through the light detection and ranging technique. There are 
221 observations with the predictor variable range varying from 390 to 720. As supported 
by the middle panel of Fig. [TJ the underlying response function is at its baseline level fol- 
lowed by a steep descent, with the point of change being of interest. There is evidence of 
heteroscedasticity and hence, we employ Method 2 without normalizing by the estimate of 
the variance. It is reasonable to assume here that till the range value 480 the function is at 
its baseline. The estimate of r is obtained by taking the average of observations until range 
reaches 480, which gives f = —0.0523. The estimates d, computed for bandwidths vary- 
ing from 5 to 30, show a fairly strong agreement as they lie between 534 and 547, with the 
estimates getting bigger for larger bandwidths. The cross-validated optimal bandwidth for 
regression is 14.96 for which the corresponding estimate of d° is 541. 

The global warming data contains global temperature anomalies, measured in degree Cel- 
sius, for the years 1850 to 2009. These anomalies are temperature deviations measured with 
respect to the base period 1961-1990. The data are modeled as described in Section [2761 
As can be seen in the right panel of Fig. [TJ the function stays at its baseline value for a 
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while followed by a non-de creasing trend. The flat stretch at the beginning is also noted in 



Zhao and Woodroofd (|2011|) where isotonic estimation procedures are considered in settings 
with dependent data. The estimate of the baseline value, after averaging the anomalies up to 
the year 1875, is f = —0.3540. With the dataset having 160 observations, estimates of the 
threshold were computed for bandwidths ranging from 5 to 30. The estimates varied over 
a fairly small time frame, 1 916-1921. This is consistent with the observation on page 2 of 



Zhao and Woodroofd (120111) that global warming does not appear to have begun until 1915. 
The optimal bandwidth for regression obtained through cross-validation is 13.56, for which 
d is 1920. 

3.3. Extensions. Here we discuss some of the possible extensions of our proposed proce- 
dure. 

(i) Fixed design setting: Although the results in this paper have been proven assuming 
a random design, they can be easily extended to a fixed design setup. Consistency of the 
procedures will continue to hold. 

(ii) Unequal replicates: In this paper, we dealt with the case of a balanced design with 
a fixed number of replicates m for every dose level Xj. The case of varying number of 
replicates m, can be handled analogously. In the dose-response setting, Theorem \T\ will 
continue to hold provided the minimum of the m^s goes to infinity. In the standard regression 
setting, Theorem[2]can also be generalized to the situation with unequal number of replicates 
at different doses. 

(Hi) Adaptive stump model: The use of 1/2 and as the stump levels may not always 
be the best strategy. The p-values to the right of d° may not be small enough to be well 
approximated by for small m. One can deal with this issue by using a more adaptive 
approach which keeps the stump-levels unspecified and estimates them from the data. For 
example, in the dose-response setting, one can define, 

n 

(a m ,n, P m ,n, d m ,n) = arg min V] {p m , n ( x i) ~ a l i x i <d) - fil(Xi> d)} 2 . 

(a,/3,d)e[o,i] 3 rr* 

1=1 

Please see pages 5 and 16 in Supplementary Material 1 for more details on this estimator. 

4. Concluding Discussion 



We briefly discuss a few issues, some of which constitute ongoing and future work on this 
topic. While we have developed a novel methodology for threshold estimation and estab- 
lished consistency properties rigorously, a pertinent question that remains to be addressed is 
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the construction of confidence intervals for d°. A natural way to approach this problem is 
to consider the limit distribution of our estimators for the two settings and use the quantiles 
of the limit distribution to build asymptotically valid confidence intervals. This is expected 
to be a highly non-trivial problem involving hard non-standard asymptotics. The rate of 
convergence crucially depends on the order of the cusp, p, at d°. As mentioned earlier, the 
minimax rate for this problem is jV -1 /( 2p+1 ) as per iRaimondol (11998b. This is in dis agree- 



Neumannl ( 19971) for a 



ment with the faster rates min{N~ 2 ^ 2p+ ' i \ N" 1 ^ 2p+1 ^) obtained in 
change-point estimation problem in a density deconvolution model. There are recent results 
( Goldenshluger et al. . 2006 . 12008 ) which suggest that N eumann's rate sho uld be optimal, but 



an asymptotic equ ivalence be t ween the density model in Neumannl (|1997|) and the regression 
model assumed in IRaimondol (| 19981) and our paper has not been formally established. Based 



on preliminary calculations, it is expected that our procedure will, at least, attain a rate of 
N~ l '( 2p+1 \ under optimal allocation between m and n for Method 1 and for a suitable choice 
of bandwidth for Method 2. 

In this paper, we have restricted ourselves to a univariate regression setup. Our approach 
can potentially be generalized to identify the baseline region, the set on which the function 
stays at its minimum, in multi-dimensional covariate spaces. This is a special case of level 
sets estimation, a problem of considerable interest in statistics and engineering. The p- values, 
constructed analogously, will continue to exhibit a limiting dichotomous behavior which can 



be exploited to construct estimates of the base 



the derivative of a certain order of /i (IMuelleii ll992l : lRaimondo . 



i ne region. Procedures th at look for a jump in 



1998) do not have natural 



extensions to high dimensional settings as the order of differentiability can vary from point 
to point on the boundary of the baseline region. 
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Appendix 

Proofs. We start with establishing an auxiliary result used in the subsequent developments. 

Theorem 4. Let T be an indexing set and {M^ : r £ T}^ =1 a family of real-valued sto- 
chastic processes indexed by h £ %. Also, let {M T : r £ T} £>e a family of deterministic 
functions defined on T-L, such that each M T is maximized at a unique point h(r) £ H. Here 
H is a metric space and denote the metric on % by d. Let h T n be a maximizer ofM.^. Assume 
further that: 

(a) sup rer sup hen \M T n (h) - M T {h)\ = o p (l), and 

(b) for every 7] > 0, c(rj) = inf T mi h ^ Bv {h(r)} [M T {h(r)} - M T {h)\ > 0, where B v (h) 
denotes the open ball of radius t] around h. 

Then, (i) sup T d{h T n , h(r)} = o p (l). Furthermore, if T is a metric space and h(r) is 
continuous in t, then (ii) — h(ro) = o p (l), provided r n converges to tq. In particular, if 
the M^s themselves are deterministic functions, the conclusions of the theorem hold with the 
convergence in probability in (i) and (ii) replaced by usual non- stochastic convergence. 

Proof. We provide the proof in the case when H is a sub-interval of the real line, the case 
that is relevant for our applications. However, there is no essential difference in generalizing 
the argument to metric spaces - euclidean distances simply need to be replaced by the metric 
space distance and open intervals by open balls. 

Given rj > 0, we need to deal with P* {sup rg7 - \h T n — h{r)\ > r/}, where P* 
is the outer probability. The event A rhTI = {sup r67 - \h T n — h(r)\ > 77} implies 
that for some r, h T n £ (h(r) - r),h(r) + 77) and therefore M T {h(r)} - M T (h 1 n ) > 
mi hmT y vMT)+v) [M T {h(r)} - M T {h)] . This is equivalent to M T {h(r)} - M T {h T n ) + 
mK) - Ml{h(r)} > utfww-nMrHi) [M T {h{r)} - M^h)] + M^K) - M^t)} . 
Now, W^(fiQ — M^{/i(r)} > and the left side of the above inequality is bounded above 
by 

2 || Mj; - M T \\ H = 2 sup \M T n (h) - M T (h)\ , 
hen 

implying that 2\\M T n - M T \\ n > mf H{h{T y vMT)+v) [M T {h(r)} - M T (h)] which, in turn, 
implies that 2 sup reT \\M T n - M T \\ n > inf rer mf mi(T) _ vMT)+v) [M T {h(r)} - M T (h)] = 
c(rj) by definition. Hence A n ^ C {sup rgT ||M^ — A/ r ||^ > c(r])/2}. By assumptions (a) 
and (b), P* {sup T6T ||M£ - M T \\ H > c(rj) /2} goes to and therefore so does P*(A UtV ). □ 
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Remark 4. We will call the sequence of steps involved in deducing the inclusion: 

{sup \h T n - h(r)\ > 77 1 C {sup \\Ml - M T \\ n > c( V )/2^ , 

as generic steps. Very similar steps will be required again in the proofs of the theorems to 
follow. We will not elaborate those arguments, but refer back to the generic steps in such 
cases. 

of Theorem^ To exhibit the dependence on the baseline value To (or its estimate), we use 
notations of the form M n (d, r ) and d min (T ). For convenience, let T^^Xi) = m 1 / 2 (F i . — 
t ) and Z im (r ) = iv in (Xj, r ) = 1 — ${T( m )(Xj)}. As m changes, the distribution of 
Zim(T ) changes, and so we effectively have a triangular array {(Xi, Z im (To))}™ =1 ~ P m , 
say. Using empirical process notation, M mjn (<i, r ) = P njm {Z lm (r ) — l/A}l(Xi < d), 
where P n m denotes the empirical measure of the data. Firstly, we find the limiting process 
for M m , in (d,T ). Define M m {d) = P m {Z lm (T ) - l/4}l(Xi < d) where M m {d) can be 
simplified as 

M m (d) = [ {u m (x) - l/A}f(x)dx, (6) 

where v m {x) = E{Z im (T Q ) \ Xi = x}. Observe that for X^ = x, as m — > oo, (X) 
converges in distribution to N(0, a 2 (x)) for x < d° and T^ m \x) = m l l 2 {Yi. — yu(x)} + 
m 1 / 2 {/i(x) — r } — >oo, in probability, for x > d°. Thus, u m (x) — > u(x) for all x G [0, 1], 
where v(x) = (l/2)l(x < d°). Let M(d) be the same expression for M m (d) in © 
with v m {x) replaced by u(x), e.g., M(d) = j^{v{x) — 1/A}f(x)dx. Observe that for 
c = (1/4) jf f(x)dx, M(d) < c for all d, and M(d°) = c. Also, it is easy to see that 
d° is the unique maximizer of M(d). Now, the difference \M m (d) — M(d)\, can be bounded 
by \v m (x) — v(x)\f(x)dx which goes to by the dominated convergence theorem. As the 
bound does not depend on d, we get \\M m — M]]^ — > 0, where || ■ ||oo denotes the supremum. 
By Theorem HI d m = argmax^gp,!] M m (d) — > argmaxd e [ ,i] M(d) = d° as m — )■ oo. It 
would now suffice to show that {d miri (f) — d m } is o p (l). 

Fix e > and consider the event {\d mtn (f) — d m \ > e}. Since d m maximizes M m and 
dm,n(T) maximizes M m n (-, f), by arguments analogous to the generic steps in the proof of 
Theorem HI we have: 

\d m>n (r) -d m \>e^ ||M m ,„(-,f) - M m (-)||oo > Vm(e)/2, 

where r) m (e) = mi de{dm _ ttdjn+t) c{M m (d m ) - M m (d)}. 
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We claim that there exists 77 > and an integer M such that r) m (e) > r] > for all 
m > M . To see this, let us bound M m (d m ) — M m (d) below by —2\\M m — M\\ 00 + M(d m ) — 
M(d). As \\M m — M\\oo as m — > 00, it is enough to show that there exists 77 > 
such that for all sufficiently large m, ini d( z( dm _ etdm+e )c{M(d m ) — M(d)} > r/. We split 
M(d m ) - M(d) into two parts as {M(d°) - M(d)} + {M(d rn ) - M (d )}. Notice that by 
the continuity of M(-), the second term goes to 0. To handle the first term, notice that M(d) 
is a continuous function with a unique maximum at d°. There exists M £ N such that for 
all m > M , we have (d° - e/2, d° + e/2) C (d m -e,d m + e) as d m ->■ d°. So, for m > M , 
mf d£{dm _ e4m+e)c {M(d°) - M(d)} > inf de( ,o_ e/2id o +e/2)c {M(rf ) - M(d)}. As M(d°) - 
M(d) is continuous, this infimum is attained in the compact set [0, 1] R — e/2, rf° + e/2) c 
and is strictly positive. Thus, a positive choice for rj, as claimed, is available. 
The claim yields, 



P m {\dm,n{^) ~ d rn\ > e} 

< P m {||M m , n (-,f)-M m , n (-,r ) 



For the first term, notice that, 



> 77/4} + P m {sup 

l>n 



(7) 



< max i<ri \Z im (f) 



To 



This is bounded above by sup ugR |$ («) — $ {n + y/m{f 



sup u6R |$(n) -$(n + a)| = 2$ (|a|/2) - 1, for a e R, ||M min (-,f 
is bounded by {2$ [m}^ 2 \f — r \/2) — 1}, which goes in probability to zero. 

To show that the last term in £7J) goes to zero, consider the class of functions T = 
{f d (x,z) = (;z — l/4)l(a; < d)\d £ [0, 1]} with the envelope F(x, z) = 1. The class J 7 is 
formed by multiplying a fixed function z ^ ( z — 1/4) with a bounded Vapnik-Chervonenkis 
classes of functions {l(x < d) : < d < 1| and therefore satisfies the entropy condi 
tion in the third display on page 168 of 



r )}\. As 

[771,71 ("; To) II 00 



satisfies the conditions of Theorem 2.8.1 of 



van der Vaart & Wellner 



van der Vaart & Wellnerl (11996b . It follows that T 

(119961) and is therefore 



uniformly Glivenko-Cantelli for the class of probability measures {P m }, i.e., 



sup P m {sup 

m>l n>k 



T ) ~ M n 



for every e > as k — > 00. Thus, we get P{\d m , n (^) 
completes the proof of the theorem. 



00 > e} ->■ 

dm\ > e } — > as m, n — )■ 00 . This 

□ 



Recall that T(x,r ) = {nh n ) l l 2 {fi(x) — t }. The following standard result from non- 
parametric regression theory is us eful in p r oving Theorem[2] The proof follows, for example, 



from the results in Section 2.2 of 



Bierens 



(1987) 
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Lemma 1. Assume that /i(-) and cr 2 (-) is continuous on [0, 1]. is continuous on [0,1]. We 
then have: 

(i) For < x, y < d° and x ^ y, 

T(x,r ) \ (( \ / K^(x)/{mf(x)} 

T(y,r ) J VW'V K^(y)/{mf(y)} 

in distribution. 

(ii) Ford < z < 1, T(z, To)— too in probability. 

Proof of Theorem^ Let v(x) and M(d) be as defined in proof of Theorem[H e.g., u(x) = 
(l/2)l(x < d°). For notational convenience, let Z;(r ) = p n (Xi) = 1 - ${T(X;,r )}. 
We eventually show that ||M n (-, f) — M(-)||oo converges to in probability and then apply 
argmax continuous mapping theorem to prove consistency. By calculations similar to those 
in the proof of Theorem [B ||M n (-,f) -M n (-,r )|| < {2$ ((n/^) 1 / 2 ^ - r \/2) - 1}, which 
converges to in probability. So, it suffices to show that ||M n (-, r ) — M^)]]^ converges to 
in probability. We first establish marginal convergence. We have 




E[${T{Xi,t )}\X 1 =x] (8) 
F L / (^n)- 1/2 [{/x(x) - r + €1 }K(0) + Et2g ~ tq)K {K\x - X,)}] y 

[ i (^)- i [^(o)+Er= 2 ^{^ 1 ^-^)}] j. ' 

The first term, both in the numerator and the denominator of the argument, is asymp- 
totically negligible and thus, the expression in ® equals E[${T(x, r ) + o p (l)}]. 
Using Lemma [1} this converges to 1 — v(x), by definition of weak convergence. As 

Z t (r ) = 1 - <t>{T(X t , r )}, we get E {M n (d, r )} = E[E {Z 1 (t ) - O^}!^ < d)^] 
which converges to M(d). Further, var{M n (d, r )} = n _1 var [{Zi(t ) -0.25}l(Xi < d)] + 
n~ l (n - l)cov [{Zi(r ) - 0.25}l(Xi < d), {Z 2 (r ) - 0.25}1(X 2 < d)} . The 
first term in this expression goes to zero as |Zi(r )| < 1. For y ^ 
x, by calculations similar to ©, E {Z 1 (t )Z 2 (t )\ X 1 = x,X 2 = y} = 
E [$ {T(x, r ) + o p (l)} $ {T(y, r ) + o p (l)}]. Using Lemma E T(>, r ) and T(y, r ) 
are asymptotically independent. Thus, by taking iterated expectations, it can be shown that 
cov [{Zi(t ) - 0.25}l(Xx < d), {Z 2 (r ) - 0.25}1(X 2 < d)} -> 0. This justifies pointwise 
convergence, e.g., M n (d, f ) - M(d) = o p (l), for d e [0, 1]. Further, as \Zi{f) - 1/4| < 1, 
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for di < d < d 2 , we have 



E [\{Mn(d, T ) - Mn(di, r )}{M n (d 2 , r ) - Mn(d, T 



< E 



n i=i ) l n t=i 



The above two terms, under expectation, are independent and thus, the expression is bounded 



(d 2 — di) 2 . As / is continuous on [0, 1], 



< oo. Thus, 



by||/ll„(d-di)(d2-d)< 
the processes {M„(-, r )} n >i are tight in D[0, 1] using Theorem 15.6 in iBillingsleyl (| 1968b . 

So, M„(-,t ) converges weakly to M as processes in D[0, 1). As the limiting process is 

degenerate and the map x(-) H- sup d6 [ ,i] \ x (d) \ is continuous, by continuous mapping, we 

get ||M n (-, r ) — M(-) || converges in probability to zero. As d°isthe unique maximizer of the 

continuous function M(-) and d„.(f) is tight as d n (f ) G [0, 1]. Hence, by argmax continuous 

mapping theorem in Ivan der Vaart & Wellnerl (|1996t ). we get the result. □ 
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